Data Bootcamp Final Project - Women in Workplace

Team: Hanbei Shi (Julia), Jiayu Zhou (Haley)


In [1]:
%matplotlib inline    
import sys                             # system module
import numpy as np                     # scientific computing
import pandas as pd                    # data package
import matplotlib as mpl
import matplotlib.pyplot as plt        # graphics module

Introduction

Whether it is the Women’s March or the ‘Fearless Girl’ statue staring down Wall Street’s bull, there has been a national conversation on women’s rights and gender equality. As aspiring female professionals, we see gender diversity in the workplace as an important personal priority. Clearly there’s important work to be done, and it starts with greater awareness of the problems.

In Part I, we look at the different ways men and women inhabit the world of work. We also often hear working mothers say they worry about balancing work and family, suggesting that they expect to face more challenges than their male counterparts or are doing a different cost-benefit analysis. In Part II, we explore the link between women's work commitments and their family commitments, and finally recommend what can be done to improve equality.

Part I: Discover Trends and Identify Problems

We use both historical time series and the latest annual data to analyze the differences between women and men in labor force participation, median earnings, and educational attainment.

1. Civilian Labor Force by Sex

  • Data: Civilian labor force by sex (1948-2015 annual averages)
  • Source: The Women's Bureau, U.S. Department of Labor
  • Purpose: Check to see the breakdown of annual labor force size by women and men across time
  • Tools: Pandas, Matplotlib

In [5]:
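#Step 1: Input Data

import pandas as pd               # use pandas to read the Excel workbook into Python

path = '/Users/Haley/Desktop/Final_Project_Data.xlsx'      # complete path to the workbook

# sheet_name/skipfooter replace the older sheetname/skip_footer arguments,
# which newer pandas versions no longer accept
sheet1 = pd.read_excel(path,
                       sheet_name='Civilian Labor Force by Sex',
                       skipfooter=7,        # skip the last 7 rows of the sheet
                       index_col=0)         # use the first column (year) as the index

print('Data types:\n\n', sheet1.dtypes, sep='')
print('Dimensions:', sheet1.shape)

sheet1.head()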


Data types:

Number of women in the civilian labor force (in thousands)      int64
Number of men in the civilian labor force (in thousands)        int64
Share of the civilian labor force who are women (percent)     float64
Share of the civilian labor force who are men (percent)       float64
dtype: object
Dimensions: (68, 4)
Out[5]:
Number of women in the civilian labor force (in thousands) Number of men in the civilian labor force (in thousands) Share of the civilian labor force who are women (percent) Share of the civilian labor force who are men (percent)
1948 17335 43286 28.6 71.4
1949 17788 43498 29.0 71.0
1950 18389 43819 29.6 70.4
1951 19016 43001 30.7 69.3
1952 19269 42869 31.0 69.0

In [6]:
#Step 2: Draw graphs

fig, ax = plt.subplots(2, 1, figsize=(8,8))      # two stacked panels

# Top panel: number of women and men in the civilian labor force
sheet11 = sheet1[['Number of women in the civilian labor force (in thousands)',
                  'Number of men in the civilian labor force (in thousands)']]
sheet11.plot(ax=ax[0],
             kind='line',
             color=['red', 'green'],
             alpha=0.65)
ax[0].legend(['Number of women in the civilian labor force',
              'Number of men in the civilian labor force'],
             fontsize=8,
             loc=0)                           # loc=0 means 'best'
ax[0].set_ylabel('Civilian labor force (thousands)')
ax[0].set_xlabel('Year')
ax[0].set_ylim(0)
ax[0].set_title('Civilian Labor Force by Sex (1948-2015)', fontsize=14, loc='left')

# Bottom panel: each sex's share of the civilian labor force
sheet12 = sheet1[['Share of the civilian labor force who are women (percent)',
                  'Share of the civilian labor force who are men (percent)']]
sheet12.plot(ax=ax[1],
             kind='line',
             color=['red', 'green'],
             alpha=0.65)
ax[1].legend(['% of women in the civilian labor force',
              '% of men in the civilian labor force'],
             fontsize=8,
             loc=0)
ax[1].set_ylabel('Share of the civilian labor force (%)')
ax[1].set_xlabel('Year')
ax[1].set_ylim(0)
ax[1].set_title('Share of the Civilian Labor Force by Sex (1948-2015)', fontsize=14, loc='left')

# Hide all four borders on both panels
for a in ax:
    for side in ['top', 'bottom', 'right', 'left']:
        a.spines[side].set_visible(False)

fig.tight_layout()                            # keep the panels from overlapping


Mini-summary:

Based on the data above, we are glad to see that women account for nearly half of the U.S. labor force. In 1948, women made up 28.6% of the civilian labor force (ages 16 and older). That share rose steadily and peaked at 46.9% in 2012. As of 2015 (the most recent data available from the Women's Bureau, U.S. Department of Labor), women made up 46.8% of the labor force, only 6.4 percentage points below men's share (53.2%).
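The share columns in this sheet appear to be computed from the count columns. As a quick check, here is a minimal sketch that rebuilds the five rows printed in Out[5] by hand and recomputes each share as women / (women + men) × 100. The short column names (`women`, `men`, `women_share`, `men_share`) are our own, not the sheet's. It also makes explicit that the 2015 gap is a difference of two shares, so it is measured in percentage points.

```python
import pandas as pd

# First five rows of the 'Civilian Labor Force by Sex' sheet (Out[5]),
# counts in thousands; the short column names are our own
counts = pd.DataFrame(
    {"women": [17335, 17788, 18389, 19016, 19269],
     "men": [43286, 43498, 43819, 43001, 42869]},
    index=[1948, 1949, 1950, 1951, 1952],
)

# Share of the labor force who are women (or men), in percent, to one decimal
total = counts["women"] + counts["men"]
counts["women_share"] = (counts["women"] / total * 100).round(1)
counts["men_share"] = (counts["men"] / total * 100).round(1)
print(counts)

# The 2015 gap quoted in the text is a difference of two shares,
# so it is measured in percentage points
gap_2015_pp = round(53.2 - 46.8, 1)
print(gap_2015_pp)
```

The recomputed shares match the sheet's values for all five years, which suggests the same formula applied to `sheet1` would reproduce its share columns, assuming the Women's Bureau rounded to one decimal place.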

2. Labor Force Participation Rate by Sex

  • Data: Labor Force Participation Rate by Sex (1948-2015 annual averages)
  • Source: The Women's Bureau, U.S. Department of Labor
  • Purpose: Check to see the differences in workforce participation rates between women and men across time
  • Tools: Pandas, Matplotlib

In [7]:
#Step 1: Input Data

sheet2 = pd.read_excel(path,
                       sheet_name='Labor Force Participation Rate',
                       skiprows=1,
                       index_col=0,
                       usecols=[0, 1, 2])   # only need the first three columns

sheet2


Out[7]:
All Women All Men
1948.0 32.7 86.6
1949.0 33.1 86.4
1950.0 33.9 86.4
1951.0 34.6 86.5
1952.0 34.7 86.3
1953.0 34.4 86.0
1954.0 34.6 85.5
1955.0 35.7 85.4
1956.0 36.9 85.5
1957.0 36.9 84.8
1958.0 37.1 84.2
1959.0 37.1 83.7
1960.0 37.7 83.3
1961.0 38.1 82.9
1962.0 37.9 82.0
1963.0 38.3 81.4
1964.0 38.7 81.0
1965.0 39.3 80.7
1966.0 40.3 80.4
1967.0 41.1 80.4
1968.0 41.6 80.1
1969.0 42.7 79.8
1970.0 43.3 79.7
1971.0 43.4 79.1
1972.0 43.9 78.9
1973.0 44.7 78.8
1974.0 45.7 78.7
1975.0 46.3 77.9
1976.0 47.3 77.5
1977.0 48.4 77.7
... ... ...
1987.0 56.0 76.2
1988.0 56.6 76.2
1989.0 57.4 76.4
1990.0 57.5 76.4
1991.0 57.4 75.8
1992.0 57.8 75.8
1993.0 57.9 75.4
1994.0 58.8 75.1
1995.0 58.9 75.0
1996.0 59.3 74.9
1997.0 59.8 75.0
1998.0 59.8 74.9
1999.0 60.0 74.7
2000.0 59.9 74.8
2001.0 59.8 74.4
2002.0 59.6 74.1
2003.0 59.5 73.5
2004.0 59.2 73.3
2005.0 59.3 73.3
2006.0 59.4 73.5
2007.0 59.3 73.2
2008.0 59.5 73.0
2009.0 59.2 72.0
2010.0 58.6 71.2
2011.0 58.1 70.5
2012.0 57.7 70.2
2013.0 57.2 69.7
2014.0 57.0 69.2
2015.0 56.7 69.1
NaN 12.4 NaN

69 rows × 2 columns


In [9]:
#Step 2: Draw a graph

plt.plot(sheet2.index, sheet2['All Women'], label='All Women')
plt.plot(sheet2.index, sheet2['All Men'], label='All Men')

plt.title('Labor Force Participation Rate by Sex', fontsize=14, loc='left')   # add title
plt.ylabel('Labor force participation rate (%)')                              # y axis label
plt.xlabel('Year')                                                             # x axis label
plt.legend()                                                                   # show which line is which



Mini-summary

Given the graph above, we can see that today a majority of American women are in the labor force. In 1948, 32.7% of women ages 16 and older were in the labor force. That share rose steadily and peaked at 60.0% in 1999. As of 2015, women's participation rate was 56.7%, 12.4 percentage points lower than the rate for men (69.1%).
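The 12.4 in the NaN-indexed row at the bottom of Out[7] equals 69.1 minus 56.7, so that row appears to be a summary of the 2015 gap rather than a year of data; it is also why the years print as floats (1948.0). A minimal sketch, on a hand-copied excerpt of the table, of dropping that row, restoring integer years, and computing the gap for every remaining year:

```python
import numpy as np
import pandas as pd

# Hand-copied excerpt of Out[7] (percent). The last row mimics the
# NaN-indexed row at the bottom of the sheet, which we assume is a summary
rates = pd.DataFrame(
    {"All Women": [32.7, 59.8, 60.0, 59.9, 56.7, 12.4],
     "All Men": [86.6, 74.9, 74.7, 74.8, 69.1, np.nan]},
    index=[1948.0, 1998.0, 1999.0, 2000.0, 2015.0, np.nan],
)

# Drop the row without a year, then turn 1948.0 back into 1948
rates = rates[rates.index.notna()].copy()
rates.index = rates.index.astype(int)

# Gap between men's and women's participation rates, in percentage points
rates["Gap (pp)"] = (rates["All Men"] - rates["All Women"]).round(1)

peak_year = rates["All Women"].idxmax()
print(rates)
print("Women's participation rate peaked in", peak_year)
```

The same filtering and `astype(int)` steps could be applied to `sheet2` right after it is read, so the plot and any later calculations use whole-number years.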

3. Bad News: Median Annual Earnings by Sex

  • Data: Median Annual Earnings by Sex (1960-2014 annual averages)
  • Source: The Women's Bureau, U.S. Department of Labor
  • Purpose: Check to see the differences in annual earnings between women and men across time
  • Tools: Pandas, Matplotlib

In [10]:
#Step 1: Input Data

sheet3 = pd.read_excel(path,
                       sheet_name='Median annual earnings by sex',
                       skiprows=2,
                       index_col=0,
                       usecols=[0, 1, 2])   # only need the first three columns
sheet3


#To do list
#Draw a line graph
#calculate the difference in year 1960 and the difference in year 2014


Out[10]:
All Women All Men
1960 22792 37565
1961 22967 38764
1962 23406 39472
1963 23852 40464
1964 24493 41409
1965 25168 41999
1966 25228 43833
1967 25729 44526
1968 26589 45721
1969 28424 46984
1970 28972 48801
1971 29164 49010
1972 29884 51649
1973 30182 53294
1974 30189 51382
1975 30033 51061
1976 30651 50921
1977 30679 52067
1978 31149 52403
1979 30888 51771
1980 30640 50930
1981 29985 50621
1982 30666 49665
1983 31467 49481
1984 32088 50407
1985 32794 50785
1986 33465 52069
1987 33725 51743
1988 33868 51277
1989 34612 50401
1990 34836 48643
1991 34853 49891
1992 35351 49941
1993 35098 49074
1994 35104 48777
1995 34729 48621
1996 35637 48313
1997 36741 49542
1998 37541 51306
1999 37404 51724
2000 37752 51210
2001 39066 51180
2002 39745 51886
2003 39548 52348
2004 39154 51131
2005 38620 50171
2006 38179 49623
2007 40080 51511
2008 39305 50985
2009 40030 52001
2010 40055 52068
2011 39073 50740
2012 38967 50936
2013 39798 50852
2013* 39428 50833
2014 39621 50383

In [11]:
#Step 2: Draw a graph

ax = sheet3.plot(title='Median Annual Earnings by Sex', color=['r', 'g'])
ax.set_ylabel('Median annual earnings ($)')



Mini-summary

Given the graph above, we can see that women have consistently been paid less than their male counterparts. While the gap has narrowed, women's median annual earnings are still about $10,800 below men's ($39,621 vs. $50,383 in 2014). Sadly, men have out-earned women in every year since 1960, and this is still the case TODAY.
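The to-do list in In [10] asks for the earnings difference in 1960 and in 2014. A minimal sketch, using the two rows copied from Out[10], that computes both the dollar gap and the ratio of women's to men's median earnings:

```python
import pandas as pd

# Median annual earnings (dollars) for 1960 and 2014, copied from Out[10]
earnings = pd.DataFrame(
    {"All Women": [22792, 39621], "All Men": [37565, 50383]},
    index=[1960, 2014],
)

# Dollar gap, and women's earnings per dollar earned by men
earnings["Gap ($)"] = earnings["All Men"] - earnings["All Women"]
earnings["Women/Men ratio"] = (earnings["All Women"] / earnings["All Men"]).round(3)
print(earnings)
```

With these two rows, the gap narrows from $14,773 to $10,762 and the ratio rises from about 0.61 to about 0.79. Because the sheet's index also contains the label `2013*`, the index of `sheet3` is probably stored as mixed objects, so selecting years from it with `.loc` may need to match how each label is stored.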

4. Possible Reason: Educational Attainment

  • Data: Educational Attainment by Sex (2015)
  • Source: The Women's Bureau, U.S. Department of Labor
  • Purpose: Check to see the differences in labor force participation rates between women and men across educational levels
  • Tools: Pandas, Matplotlib

In [14]:
#Step 1: Input Data

sheet4 = pd.read_excel(path,
                       sheet_name='Participation Rate by Edu Sex',
                       skipfooter=6,
                       index_col=0)

sheet4 = sheet4[['Women', 'Men']]     # keep only the two columns we plot

In [15]:
#Step 2: Draw a graph

ax = sheet4.plot(figsize=(17,6), ylim=(0,100), kind='bar', color=['red','g'], alpha=0.5,
                 title='Labor Force Participation Rate by Educational Attainment and Sex, 2015')
ax.set_ylabel('Labor force participation rate (%)')



Mini-summary

Given the bar chart, we can see that in every category, women are less likely than men to be employed or actively looking for work, even when they have similar levels of education.

Only 32.3% of women with less than a high school diploma participate in the labor force, compared with 58.3% of men with the same level of education. Even at the top, 72.2% of women with advanced degrees (master's, professional or doctoral degrees) participate, versus 77.8% of men in the same category.

What keeps women who are just as intelligent, capable and competent out of the workforce, leaving a 5.6-percentage-point gap even among those with advanced degrees? What is holding them back? This brings us to the next possible reason.
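The 5.6 figure for advanced degrees is a difference between two participation rates, measured in percentage points, not a share of women. A minimal sketch computing the gap for the two education levels quoted in the text; the row labels are our own shorthand and may not match the labels in `sheet4`:

```python
import pandas as pd

# Participation rates (percent) for the two education levels quoted above;
# the row labels are hypothetical shorthand, not the sheet's labels
edu = pd.DataFrame(
    {"Women": [32.3, 72.2], "Men": [58.3, 77.8]},
    index=["Less than high school", "Advanced degree"],
)

# Gap in percentage points (a difference of two rates, not a percent change)
edu["Gap (pp)"] = (edu["Men"] - edu["Women"]).round(1)
print(edu)
```

Running the same subtraction on `sheet4` would give the gap for every education level in the chart, which makes it easy to see how the gap shrinks, but does not close, as education rises.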

Part II: Focus on Working Mothers and Deep Dive

Again, the insight we derived from the previous section tells us that even when men and women have similar levels of education, women are less likely to be employed or actively looking for jobs. We ask: Why is it that at every level of education, women fall behind in terms of participation in the labor force?

We often hear women say they worry about balancing work and family, suggesting they expect to face more challenges than their male counterparts or are doing a different cost-benefit analysis.

Within the scope of this analysis, we focus on the link between women's work commitments and their family commitments.

Our Approach:

Women do the majority of housework and child care. We ask: at every stage in their careers, do women still perform more housework and child care than men? Is there a link between the amount of work women do at home and their career ambition? To uncover insights, we use the datasets below to explore three questions:

  1. Does being a mother reduce a woman's commitment to work?
  2. Does being a mother prevent a woman from getting work?
  3. Does the amount of time spent on childcare contribute to women's lack of advancement in the workplace?

1. Employed Parents by Full- and Part-Time Status, Sex and Age of Youngest Child

  • Data: Employed Parents by Full- and Part-Time Status, Sex and Age of Youngest Child (2015)
  • Source: The Women's Bureau, U.S. Department of Labor
  • Purpose: Check to see if mothers are more likely than fathers to switch to part-time positions to care for children
  • Tools: Pandas, Matplotlib

In [16]:
#Step 1: Input Data
sheet5 = pd.read_excel(path,
                       sheet_name='Employed parents by status',
                       skipfooter=4,
                       skiprows=1,
                       index_col=0)

# Keep only the full-time rows
sheet16 = sheet5[sheet5['Type'] == 'Full-time']
sheet16


Out[16]:
Type Percent of total employed of mothers Percent of total employed of fathers
Age of youngest child
under 3 years Full-time 72.8 94.4
3 to 5 years Full-time 74.6 95.7
6 to 17 years Full-time 77.7 95.8
under 18 years Full-time 76.0 95.4

In [17]:
sheet17 = sheet5[sheet5['Type'] == 'Part-time']
sheet17


Out[17]:
Type Percent of total employed of mothers Percent of total employed of fathers
Age of youngest child
under 3 years Part-time 27.2 5.6
3 to 5 years Part-time 25.4 4.3
6 to 17 years Part-time 22.3 4.2
under 18 years Part-time 24.0 4.6

In [29]:
fig, ax = plt.subplots(2, 1, figsize=(14,14))      # two stacked panels

# Draw the same bar chart for the full-time (top) and part-time (bottom) shares
panels = [(sheet16, 'full-time'), (sheet17, 'part-time')]
for a, (data, status) in zip(ax, panels):
    data.plot(ax=a,
              kind='bar',                       # bar chart
              color=['purple', 'yellow'],       # mothers, fathers
              alpha=0.5, width=0.4)
    a.legend(['Mothers', 'Fathers'], fontsize=10, loc='center')
    a.set_ylabel('Percent of total employed')
    a.set_xlabel('Age of youngest child')
    a.set_ylim(0)
    a.set_title(f'Employed parents by {status} status, sex and age of youngest child, '
                '2015 annual averages', fontsize=10, loc='left')
    for side in ['top', 'right', 'left']:      # hide the same borders on both panels
        a.spines[side].set_visible(False)



In [28]:
fig, ax = plt.subplots(figsize=(12,4))

sheet16.plot(ax=ax,
             kind='bar',                    # bar chart
             color=['purple', 'yellow'],    # mothers, fathers
             alpha=0.5, width=0.4)

ax.set_ylabel('Percent of employed parents working full time')
ax.set_xlabel('Age of the youngest child')
ax.set_ylim(0)
ax.set_title('Share of Employed Parents Working Full Time', fontsize=14)
ax.legend(['Mothers', 'Fathers'], fontsize=8, loc=0)
ax.spines["top"].set_visible(False)


Mini-summary

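The younger the child, the less likely an employed mother is to work full time: 72.8% of employed mothers whose youngest child is under 3 work full time, compared with 74.6% when the youngest child is 3 to 5 and 77.7% when the youngest child is 6 to 17. Fathers' full-time share barely changes (94.4% to 95.8%). Mothers seem to shift back toward full-time employment as their children grow older. Note that the last category, "under 18 years," covers all parents with a child under 18, so its slightly lower value (76.0%) is an overall average rather than a later stage of child-rearing. Taken together, the data suggest that mothers are usually the ones in the family who change career paths to meet the needs of the family.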
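Instead of filtering `sheet5` twice, a single pivot can put the full-time and part-time shares side by side. A minimal sketch on a hand-copied, long-format excerpt of Out[16] and Out[17]; the `Mothers`/`Fathers` column names are our shorthand for the sheet's "Percent of total employed of ..." columns. It also checks that the two shares add up to 100% and computes how many more percentage points of mothers than fathers work part time:

```python
import pandas as pd

ages = ["under 3 years", "3 to 5 years", "6 to 17 years", "under 18 years"]

# Long-format excerpt of the 'Employed parents by status' sheet (Out[16], Out[17]);
# 'Mothers'/'Fathers' are our shorthand for the percent-of-employed columns
tidy = pd.DataFrame({
    "Age of youngest child": ages * 2,
    "Type": ["Full-time"] * 4 + ["Part-time"] * 4,
    "Mothers": [72.8, 74.6, 77.7, 76.0, 27.2, 25.4, 22.3, 24.0],
    "Fathers": [94.4, 95.7, 95.8, 95.4, 5.6, 4.3, 4.2, 4.6],
})

# Rows: age of youngest child; columns: (parent, Full-time/Part-time)
wide = tidy.pivot(index="Age of youngest child", columns="Type",
                  values=["Mothers", "Fathers"])
wide = wide.reindex(ages)   # pivot sorts the labels, so restore the original order

# Full-time + part-time should cover 100% of employed mothers (and fathers)
mother_totals = wide["Mothers"].sum(axis=1).round(1)
father_totals = wide["Fathers"].sum(axis=1).round(1)

# Percentage-point gap between mothers and fathers working part time
part_time_gap = (wide[("Mothers", "Part-time")] - wide[("Fathers", "Part-time")]).round(1)
print(wide)
print(part_time_gap)
```

The part-time gap narrows from 21.6 to 18.1 percentage points as the youngest child ages from under 3 to 6 to 17; the "under 18 years" row (19.4) is the average over all parents with a child under 18.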

2. Unemployment Rates of Parents by Sex and Age of Youngest Child

  • Data: Unemployment Rates of Parents by Sex and Age of Youngest Child (2015 Annual Averages)
  • Source: The Women's Bureau, U.S. Department of Labor
  • Purpose: Check to see if mothers are more likely than fathers to face unemployment
  • Tools: Pandas, Matplotlib

In [4]:
#Step 1: Input Data
sheet6 = pd.read_excel(path,
                       sheet_name='Unemployment Rate of parents',
                       skipfooter=4,
                       index_col=0)
sheet6


Out[4]:
Unemployment rates of mothers Unemployment rates of fathers
Age of youngest child
under 3 years 6.6 3.7
3 to 5 years 6.0 3.7
6 to 17 years 4.5 3.0
under 18 years 5.3 3.3

In [8]:
#Step 2: Draw a graph

fig, ax = plt.subplots(figsize=(12,4))

sheet6.plot(ax=ax,
            kind='bar',                    # bar chart
            color=['purple', 'yellow'],    # mothers, fathers
            alpha=0.5, width=0.4)

ax.set_ylabel('Unemployment rate (%)')
ax.set_xlabel('Age of the youngest child')
ax.set_ylim(0)
ax.set_title('Unemployment Rate of Parents', fontsize=14)
ax.legend(fontsize=8, loc=0)
ax.spines["top"].set_visible(False)


Mini-summary

We can derive the following insights from the graph above:

  1. Mothers are more likely than fathers to be unemployed, especially when the youngest child is under 3 years old (6.6% vs. 3.7%).
  2. Mothers' unemployment rate falls as the youngest child in the household gets older (6.6% for children under 3, 6.0% for ages 3 to 5, 4.5% for ages 6 to 17). The last category, "under 18 years," covers all parents with a child under 18, so its 5.3% is an overall average rather than an exception. This points to a potential link between the time and energy invested in childcare and women's ability to excel in the workplace.
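The gap between mothers and fathers can be expressed either as an absolute difference or as a ratio. A minimal sketch on the values from Out[4]:

```python
import pandas as pd

# Unemployment rates (percent) copied from Out[4]
unemp = pd.DataFrame(
    {"Mothers": [6.6, 6.0, 4.5, 5.3], "Fathers": [3.7, 3.7, 3.0, 3.3]},
    index=["under 3 years", "3 to 5 years", "6 to 17 years", "under 18 years"],
)

# Absolute gap (percentage points) and relative gap (mothers' rate / fathers' rate)
unemp["Gap (pp)"] = (unemp["Mothers"] - unemp["Fathers"]).round(1)
unemp["Ratio"] = (unemp["Mothers"] / unemp["Fathers"]).round(2)
print(unemp)
```

By either measure, the gap is widest when the youngest child is under 3, where mothers' unemployment rate is about 1.8 times fathers'.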

3. Employed Americans' Time Use Survey by Sex

  • Data: Time spent in primary activities for the civilian population 18 years and over by presence and age of youngest household child and sex, 2015 annual averages, employed
  • Source: Bureau of Labor Statistics
  • Purpose: Check to see if working mothers spend more time on childcare activities than working dads
  • Tools: Pandas, Matplotlib

In [30]:
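#Step 1: Import Data
sheet7 = pd.read_excel(path,
                       sheet_name='Table 8B Important',
                       skiprows=3,
                       skipfooter=4,
                       index_col=0)

# pandas renames repeated headers to 'Men.1', 'Women.1', ...; this second pair is
# the group plotted below (youngest child under 6)
sheet71 = sheet7[['Men.1', 'Women.1']]
sheet71 = sheet71.rename(columns={'Men.1': 'Men', 'Women.1': 'Women'})
sheet71 = sheet71.iloc[[4, 5, 6, 13, 12, 16], :]   # the six activities we compare, by position
sheet71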


Out[30]:
Men Women
Activity
Household activities 1.18 1.88
Housework 0.25 0.73
Food preparation and cleanup 0.34 0.84
Caring for and helping household children 1.13 1.91
Caring for and helping household members 1.29 2.24
Working and work-related activities 6.62 4.61

In [31]:
#Step 2: Draw a graph

fig, ax = plt.subplots(figsize=(12,4))

sheet71.plot(ax=ax,
             kind='bar',                    # bar chart
             color=['purple', 'yellow'],    # men, women
             alpha=0.5, width=0.4)

ax.set_ylabel('Average hours per day')
ax.set_ylim(0)
ax.set_title("Employed Americans' Time Use by Sex, Youngest Child Under 6", fontsize=14)
ax.legend(['Men', 'Women'], fontsize=8, loc='best')
ax.spines["top"].set_visible(False)



In [32]:
#Step 1: Reuse the sheet already read in In [30]
# 'Men.2'/'Women.2' hold the third pair of Men/Women columns (youngest child 6 to 17)
sheet72 = sheet7[['Men.2', 'Women.2']]
sheet72 = sheet72.rename(columns={'Men.2': 'Men', 'Women.2': 'Women'})
sheet72 = sheet72.iloc[[4, 5, 6, 13, 12, 16], :]   # same six activities as above

sheet72


Out[32]:
Men Women
Activity
Household activities 1.24 1.90
Housework 0.19 0.70
Food preparation and cleanup 0.33 0.81
Caring for and helping household children 0.44 0.68
Caring for and helping household members 0.59 0.93
Working and work-related activities 6.54 5.37

In [33]:
#Step 2: Draw a graph

fig, ax = plt.subplots(figsize=(12,4))

sheet72.plot(ax=ax,
             kind='bar',                    # bar chart
             color=['purple', 'yellow'],    # men, women
             alpha=0.5, width=0.4)

ax.set_ylabel('Average hours per day')
ax.set_ylim(0)
ax.set_title("Employed Americans' Time Use by Sex, Youngest Child 6 to 17", fontsize=14)
ax.legend(['Men', 'Women'], fontsize=8, loc='best')
ax.spines["top"].set_visible(False)
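To compare the two time-use tables directly, we can subtract men's hours from women's. A minimal sketch on the values copied from Out[30] and Out[32]. In the American Time Use Survey tables some rows are subcategories of others (housework, for example, falls under household activities, and caring for household children under caring for household members), so the differences should be read row by row rather than summed:

```python
import pandas as pd

activities = [
    "Household activities",
    "Housework",
    "Food preparation and cleanup",
    "Caring for and helping household children",
    "Caring for and helping household members",
    "Working and work-related activities",
]

# Average hours per day for employed men and women, copied from Out[30] and Out[32]
under_6 = pd.DataFrame({"Men": [1.18, 0.25, 0.34, 1.13, 1.29, 6.62],
                        "Women": [1.88, 0.73, 0.84, 1.91, 2.24, 4.61]},
                       index=activities)
age_6_17 = pd.DataFrame({"Men": [1.24, 0.19, 0.33, 0.44, 0.59, 6.54],
                         "Women": [1.90, 0.70, 0.81, 0.68, 0.93, 5.37]},
                        index=activities)

# Women's minus men's hours per day; positive values mean women spend more time
diff = pd.DataFrame({
    "Youngest child under 6": (under_6["Women"] - under_6["Men"]).round(2),
    "Youngest child 6 to 17": (age_6_17["Women"] - age_6_17["Men"]).round(2),
})
print(diff)
```

Work is the only row where men report more time. In the notebook itself, `sheet7.loc[activities, ['Men.1', 'Women.1']]` would select the same six rows by label instead of by position, assuming the labels in the sheet match the printed ones exactly (for example, without leading or trailing spaces).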


Conclusion: People who do more work at home are less interested in becoming top executives

Even though gender inequality seems to have improved over the past decades, the expectation that women will take care of the family still persists in society. Voluntarily or reluctantly, women change their career paths to meet the needs of the family, which lowers their chances of achieving the same success as men. At every stage in their careers, women do more housework and child care than men, and there appears to be a link between the amount of work people do at home and their career ambition. We conclude that the more work mothers do at home, the less interested they may be in maintaining or succeeding in a career.

Recommendation: Female employees need the flexibility to fit work into their lives

Each one of us has an important role to play, from talking more often and openly about gender diversity to modeling our commitment in our everyday actions. Our research supports the notion that women have long been viewed as the caretakers in their families, and suggests a negative relationship between the amount of housework they do and the success they achieve in the workplace. Fostering equality is not always about making big gestures; it can start in the comfort of one's own home, with men doing their fair share of housework.

References